perm filename MACHIN[BOO,JMC]2 blob
sn#525095 filedate 1980-07-22 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00012 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 .if false then begin "notes"
C00004 00003 .s(MACHIN,HOW LISP WORKS)
C00008 00004 .ss(lispobj,LISP objects.)
C00014 00005 .bb Symbolic Atoms: property lists
C00039 00006 .bb List structure.
C00049 00007 .ss(top,The Top level of LISP.)
C00078 00008 .ss(edit,Editing LISP programs.)
C00099 00009 .ss(eror,Error Handling.)
C00111 00010 .ss(debbug,Debugging Aids in LISP.)
C00124 00011
C00125 00012 .ss(machinex,Exercises.)
C00128 ENDMK
C⊗;
.if false then begin "notes"
reference to LISP machine and its manual
need some good simple oblist atom-making and plist exercises.
Editor section: remark that editor works on lists (programs)
not on general S-exps
Note comparison of editor primitives (1 2 ...) and lisp primitives
(car cadr ...) . Why not cdr in editor?
Example of interactive program for executing immediate commands
Differ from normal programming languages
how much of difference is traditional, how much due to difference
in function
<< write program to implement deutsch algorthimn interms of primitives
⊗mark, ⊗markedp (using rplacs, eq etc.)>>
need good ref to discussion of funarg problem
need concrete example
.end "notes"
.if false then begin "read=print inverse"
jmc says this statement can't be proved. what is wrong???
⊗read and ⊗prina or ⊗prinb are inverses of one another modulo conversion
of external representations. Thus ⊗⊗read[prina[e,_qNIL]]_=_e⊗,
⊗⊗read[prinb[e,_qNIL]]_=_e⊗, and if ⊗u is a "dot" representation then
⊗⊗prina[read[u],_qNIL]_=_u⊗, while if ⊗u is a "list" representation then
⊗⊗prinb[read[u],_qNIL]_=_u⊗.
.end "read=print inverse"
.s(MACHIN,HOW LISP WORKS)
.if NOTASSEMBLING then start
.SEARCH: NEXT SECTION!;
.EXTEND: NEXT SECTION!;
.ABSNTX: NEXT SECTION!;
.COMPIL: NEXT SECTION!;
.COMPUT: NEXT SECTION!;
.end
The purpose of this chapter is to give you an idea of what it takes
to make LISP actually run on a computer. It is not intended as a manual
for any particular system although hopefully an understanding of the
basic components of LISP as a interactive programming system will
help you understand and use any particular LISP system.
We will discuss components common to essentially all LISP systems.
There are some hints as to the variety of possible implementations.
The organization of the chapter is "bottom up". We begin with
as discussion of LISP objects -- the basic data structures understood
and manipulated by a LISP system.
Next we describe the three main components -- ⊗read, ⊗eval and ⊗print --
which make up the "Top-level" of LISP.
This "read-eval-print loop" is generally what you
talk to when running LISP interactively.
In principle this is all that is needed.
However, in order to interact productively with a programming system
we require of it more than just the ability to run programs.
Most LISP systems have facilities for
making output more readable -- known as pretty-printing,
for editing programs,
for detecting and reporting errors,
for aiding in the debugging process
and for interacting with the host system -- external file manipulation for
example.
The ability to run programs interpretively and the very simple syntax of
LISP make it possible to have very elegant and flexible editing, error-handling,
and debugging facilities as an integral part of the system.
We present a simple editor based on the MACLISP editor and
indicate several possible extensions. We discuss how tracing and
stepping through the execution of a program can be done in LISP.
Finally there is a brief section on I/O. This aspect of LISP
(or any system) is necessarily implementation dependent. However
there are some basic operations common to most LISPs and we
present (an abstraction of) them.
[Remark: in the description of the basic LISP structures, we refer
occasionaly to programs which aren't directly available to the user.
They may or may not be explicitly implemented. Mainly, they are
convenient for purposes of explanation.]
.ss(lispobj,LISP objects.)
A LISP system deals with a variety of objects in the process
of representing S-expressions as list-structures
and carrying out computations on these structures. These include
pnames (corresponding to the character string representation of atoms)
numeric and symbolic atoms, oblists (LISPs version of a symbol table)
and pointers of various flavors. In this section we will
discuss the construction and basic operations on these objects.
.bb Pnames.
Each atom has a print name (pname for short) which is
the link between the internal representation of atoms and the
external representation as a character string.
The key component of the LISP read program is ⊗readatom which
converts the character string representing an atom externally
to the corresponding pname. The pname is used to determine
the internal representation of the atom.
Correspondingly, the key component of the LISP print program
is ⊗printatom which converts the pname to the corresponding external
(character string) representation. Clearly if the mapping between
representations is to be correct we must have, for atoms ⊗a and
character strings ⊗c,
⊗⊗readatom[printatom[a]]=a⊗ and ⊗⊗printatom[readatom[c]]⊗.
Generally pnames are not directly accessible to the user.
They are however, important properties of atoms and
several of the basic operations are conviently described using
pnames as intermediate objects.
[Remark: ⊗readatom and ⊗printatom are not programs typically available
directly to the user. See remark at the end of the introduction to
this chapter.]
.bb Numeric Atoms.
LISP distinguishes two kinds of atoms, numeric and symbolic.
Numeric atoms are generally assumed to denote particular numbers
and can be recognized, represented and printed without a lot of
additional information. Thus, although numbers are classed as atoms
(e.g. qat 1 is true)
the representation of numbers in LISP is usually different than that of
symbolic atoms. Numbers are not entered on the oblist and they do not have
property lists. Their pnames and "values" are computed from the representation
and cannot be accessed in the usual way. In particular,
you cannot set the "value" of a numeric atom
(which is probably just as well) so they really
are constants. One possible representation of numbers is by
a cell the a-part of which says this is an atom, and d-part which
is a pair consisting of the type and a pointer to actual machine
representation of the numerical value. Sufficiently small integers
can be represented more compactly by a cell the a-part of which is
a special flag and the d-part of which is a machine representation of
the numerical value. Another possibility is to store all numbers of
a particular kind in a particular area and represent a number by a
pointer directly to the machine representation of its value.
As we mentioned in Chapter {SECTION READIN}, LISP can treat arbitrarily
large integers by representing them as a list "digits" with respect
to some large base. Typically the first element of the list is
a flag saying "I am a bignum" and a sign.
In any case the reading, printing, and other primitive programs of LISP
must pay attention to whether or not the object being dealt with
is a symbolic atom or a numerical atom or some non-atomic object.
.bb Symbolic Atoms: property lists
Symbolic atoms do not in general have a meaning fixed by the system.
They can be names of variables, programs, macros, or arrays and as such
must be capable of having values, program code (compiled or in
S-expression form), macro definitions, and array descriptions (pointers)
associated with them. In addition arbitrary user defined properties
can be associated with an atom.
In the simplest case LISP treats all properties of an atom
(including its pname) uniformly by putting them on a property list.
In this case the atom is represented by a pointer to the property
list and all properties can be accessed and set uniformly by
special operations on property lists. In order to distuingish
an atom (e.g. a property list) from other list-structure, the a-part
of the first element contains data that is recognized as an
"I am an atom" flag.
This uniformness is simple and elegant.
Information stored on a property list can be retrieved for a particular atom
in a time that depends on the length of the property list of the atom,
and this is usually quite short. The time is independent of the
number of atoms.
There is some inherent inefficiency when frequently used properties, like value,
are not accessed directly but must be looked up like user defined properties.
Also the uniformity allows the programmer to manipulate the
state of the system in arbitrarily complicated and unmanagable ways.
Another possible representation of an atom is
a pointer to a block of memory with system properties such as pname,
value, (possibly) program descriptions, and a pointer to the
list for user defined properties stored in fixed locations
relative the block pointer. There may also be restricted access
to some of the system properties, thus imposing a small amount of
discipline on the programmer.
Property lists are usually realized as ordinary lists.
Each "property" has an atom as its name, and the name of the
property is ordinarily followed by a pointer to the information
in question. Because this information can be a string of characters,
a floating point number or a pointer to a subroutine, the property
list is ordinarily not a proper LISP list, and programs that operate
on proper lists can cause errors if applied to property lists.
For example, a program that interpreted a floating point number
as a pair of pointers would interpret random places in memory as
list structure.
For this reason, property lists are usually read and modified
by special programs. We have already seen two such programs, namely
$DEFUN and $DEFPROP which can be used to put function definitions on
property lists. The three basic programs on property lists are:
.begin nofill turnon "∂"
.n←24
∂(n)($GET <atom> <property>)
∂(n)($PUTPROP <atom> <value> <property>)
∂(n)($REMPROP <atom> <property>)
.end
where <atom> should evaluate to an atom, <property> must evaluate to an atom
which names the property, and <value> can be any LISP expression that can be
evaluated. [In some LISPs a property list is also an accepted value for <atom>.]
The exact form of these terms is implementation dependent, in particular
the order in which the arguments are expected may vary.
$GET returns the <property> value (if any) associated with <atom>.
$PUTPROP puts a property name <property> followed by the property value <value>
on the property list of <atom>, and $REMPROP removes a property.
$DEFPROP is like $PUTPROP except that the arguments are not evaluated.
Thus if $$(PUTPROP_'C_12_'AT-WT)$ is executed then
$$(GET_'C_'AT-WT)$ will return $12. If $$(REMPROP_'C_'AT-WT)$ is executed then
$$(GET_'C_'AT-WT)$ will return qNIL as the $AT-WT property is no longer on the
property list of $C. If $$(DEFPROP_APPLE_RED_COLOR)$ is executed then
$$(GET_'APPLE_'COLOR)$ will return $RED. Thus $DEFPROP can be used to put
arbitrary properties on property lists. The difference between $DEFPROP
and $PUTPROP is that $DEFPROP does not evaluate its arguments, but $PUTPROP does.
Most LISPs allow you to examine the property list of an atom directly.
In the case where an atom is represented by a pointer to its property list,
the ⊗car is and "atom-header and the ⊗cdr is a list of alternating
property names and property values. In the case where an atom is represented
by a pointer to a block of memory a special operation is needed to get at
the property list. For example in MACLISP there $$(PLIST 'A)$ is the
property list of the atom $A.
From a mathematical point of view $PUTPROP acts like an
assignment statement and $REMPROP like a special kind of an
assignment statement. $GET gets the value of a variable.
However, the mathematical properties of these LISP operations
haven't been systematically studied.
The particular collection of properties used by LISP varies,
however, it usually includes $VALUE and $PNAME as well as properties
whose values describe "applicative" objects such as programs, macros
and arrays. Names for the latter properties inclued
$EXPR, $FEXPR, $LEXPR, $SUBR, $FSUBR, $LSUBR, $MACRO and $ARRAY.
We will indicate how these properties are used when we discuss the
⊗eval component of LISPs Top-level.
One use of property lists in programming is in the
creation and modification of networks of data. Values attached to
atoms point to expressions that include other atoms.
A program manipulating chemical structures might use property lists
to associate such information as valence, atomic weight, nuclear spin, etc.
with each chemical atom of interest. It could also treat fragments as
quasi atoms (chemical atoms, that is) by naming them,
putting the structure of the fragment on
the property list, as well as analogues to other atomic properties.
These properties can then be used as required by programs for building
or analysing complex molecules.
Another example is the program used to print the LISP programs appearing
in this book in external form. Each special LISP atom (such as
$CAR, $COND, $APPEND, $QUOTE, ...) has on its property list the name
of the program used to compile the construct it signals, type
information, external print name (so that $CAR prints as qqa for example)
and other control information. This is an example of "data driven"
programming. Another example of a data driven program would be a compiler
that looked up the program to compile special constructs on the $COMPFN property
of the atom signalling that construct. Such a compiler is quite flexible and
allows you to extend the language easily. Another example of this style
of program is the ⊗editor given in Chapter {section MACHIN}.
.bb Symbolic atoms: Oblists.
LISP keeps track of atoms explicitly known by name by making
entries in the "oblist" (called obarray in some implementations).
The oblist is used mainly by the read program to look up atoms
having a given pname. If an atom is known (has been mentioned by name
previously and not explicitly forgotten) then a pointer to the
atom can be obtained by looking up the oblist entry corresponding to
its pname. Otherwise an atom having that pname must be created and
entered in the oblist. This protocol insures that separate mention of
atoms having the same external representation are represented by the
same internal structure.
The oblist could be an ordinary list manipulated by ⊗car, ⊗cdr,
⊗cons and even ⊗⊗rplac⊗'s. Some implementations
use arrays rather than lists to store the oblist, hence the name "obarray".
Often the oblist is "hashed" in order to speed up the search for a particular
entry. The idea of a hashed list (or array) is that it is divided up into
a fixed number ⊗n of sublists called "hash buckets", and there is a program
⊗hash which computes a number between 1 and ⊗n for each possible element of
the list. If ⊗hash[e] is ⊗k then ⊗e is in the ⊗⊗k⊗th bucket and the search for
⊗e can be restricted to searching that bucket rather than the whole list.
In the case of oblists the argument to ⊗hash is a pname. If the hashed
list is to be an improvement over a simple linear list as far as searching
efficiency is concerned, the hash program should distribute the elements of
the list evenly among the buckets. This is not always easy to do. For a further
discussion of hashing as a general searching method see Knuth[1968b].
There are two basic operations for modifying an oblist, ⊗intern and ⊗remob.
⊗intern takes an atomic object (a pointer to it) and returns a pointer
to an atom with the same pname which is a member of the oblist.
If no atom with the same pname
is on the oblist then ⊗intern puts the given atom there, and the returned
pointer will thus be equal to the input pointer. If an atom of the
same name is already on the oblist, then the pointer to that atom is
returned and it may or may not be equal to the input pointer.
(A cleaner explanation of ⊗intern would require an oblist as an argument
also. In LISP many operations use some part of the LISP environment
as implicit arguments. Of course they also may update the environment
when executed.)__
⊗remob is the atom forgetting program. It takes an atomic object
as an argument and removes it from the oblist. ⊗remob doesn't
destroy the atom, it just can no longer be referenced by name.
⊗remob and ⊗intern are usually available directly to the user.
As with any destructive operation the use of ⊗remob can get you into trouble if
you are not careful, and are not quite sure of the state of the world when you
do it.
The basic primitive for creating atoms is ⊗mkatom which takes
a pname and returns an atom with that pname and with no other properties
set. ⊗mkatom is not directly available to the user, but is used
(possibly implicitly) by other system programs that manipulate atoms.
.bb Making atoms: Examples.
LISP makes available to the programmer several operations
on atoms and oblists. These include ⊗maknam, ⊗implode and ⊗gensym for
making atoms, ⊗explode for analysing pnames and ⊗intern and ⊗remob for
updating the oblist. The oblist operations were described above.
(You should consult the manual for the LISP you are using, for details
about the precise names and conventions for using these programs.)
In the following we attempt to explain the subtilties and uses of the
programs for making atoms.
First we consider why the ability to make atoms (other than
as needed when the are read) is useful. In some sense the usefulness
of this and other features is limited only by the imagination of the programmer.
One class of programs that use the atom making ability are programs
that need to invent labels for some reason. Compilers are in this class.
The typical output of a compiler is a list of symbolic assembly instructions
some of which are labeled with symbolic addresses which are referenced
in other instructions.
Other members of the class include program generating and program transforming
programs that need to introduce new variable names, statement labels, etc..
Another example is a program that uses temporary files as
work space and the number of files needed is not known in advance.
Therefore it is convenient to be able to generate file names.
Finally in a system of programs for manipulating logical formulae or
λ-expressions a program that substitutes an expression for
occurrences of a bound variable generally needs to be able to
rename bound variables in order to avoid conflict. This is most
conveniently done by providing a means of generating new variable names
as the need arises.
The operation ⊗maknam takes a list of characters (atoms whose
pname is a single character) and returns an atom whose pname is
the concatenation of this list of characters. It does not enter
the atom on the oblist, or look to see if an atom of the same pname
is already known. Thus it always produces an atom that is not
qeq any already existing atom.
⊗gensym behaves similarly, but it takes no arguments.
It determines the pname from a prefix and
a count stored in some gobal location. The usual prefix is $G
and the resulting pname has the form $Gdddd where each $d is a decimal digit.
Each time ⊗gensym is evaluated the count is incremented by 1.
Thus if we type $(GENSYM) to LISP and it types back $GOO69 then
typing $(GENSYM) again will result in $G0070 being typed back.
(No prediction is made as to what will happen when the symbol reaches $$G9999$).
(LISP implementations usually provide the user a mechanism for resetting
the $GENSYM count and prefix character(s).)
Note the non-functional aspect of ⊗gensym. Namely,
⊗gensym[]≠gensym[]. (Try typing $$(EQ (GENSYM) (GENSYM))$ to LISP.)
⊗gensym is useful in places where you need a label,
but don't care what it is called.
⊗implode, like ⊗maknam takes a list of characters and returns
an atom whose pname is the concatenation of the characters in the list.
However, unlike ⊗maknam it first looks to see if an atom with that pname
already known (in the oblist) and if so returns that atom (not a new one)
If not, a new atom is made with the appropriate pname and entered in the
oblist. Thus ⊗implode is the same as applying ⊗readatom to the
pname computed from the character list. It behaves more like a function
than do ⊗maknam and ⊗gensym.
⊗explode is essentially an inverse to ⊗implode. It takes an atom
and returns the character list corresponding to the pname of the atom.
One reason for using ⊗gensym and ⊗maknam has to do with the fact that
an atomic symbol that is put on the oblist never goes away (unless you
explicitly ⊗⊗remob⊗ it), while atoms generated by ⊗gensym and
⊗implode can be garbage collected as soon as no one is pointing to them
any more. Thus if you have a progran that is going to
generate a lot of temporary symbols which will not be wanted after some
it is efficient spacewise to not put these symbols on the oblist.
If you are confused at this point, some examples of the behaviour
of ⊗maknam, ⊗implode, ⊗gensym, ⊗intern, ⊗remob, and ⊗explode my help.
Your are strongly encourage to go to your nearest LISP and try some
experiments of your own to be sure you understand what is happening.
.begin nofill turnon "∂"
.n←15;m←5;o←55
∂(m)user:∂(n)$$(EQ (MAKNAM '(B A M)) (MAKNAM '(B A M)))$
∂(m)LISP:∂(n)$$NIL $
∂(m)user:∂(n)$$(EQ (IMPLODE '(B A M)) (IMPLODE '(B A M)))$
∂(m)LISP:∂(n)$$T $
∂(m)user:∂(n)$$(SETQ X (MAKNAM '(F O O)))$
∂(m)LISP:∂(n)$$FOO $
∂(m)user:∂(n)$$(EXPLODE X)$
∂(m)LISP:∂(n)$$(F O O) $
∂(m)user:∂(n)$$(EXPLODE 'FOO)$
∂(m)LISP:∂(n)$$(F O O) $
∂(m)user:∂(n)$$(EQ X (MAKNAM '(F O O)))$ ∂(o);⊗maknam makes a new atom
∂(m)LISP:∂(n)$$NIL $
∂(m)user:∂(n)$$(EQ X (IMPLODE '(F O O)))$ ∂(o);⊗maknam didn't intern ⊗x
∂(m)LISP:∂(n)$$NIL $
∂(m)user:∂(n)$$(EQ 'FOO (IMPLODE '(F O O)))$∂(o); ⊗readatom interns $FOO
∂(m)LISP:∂(n)$$T $
∂(m)user:∂(n)$$(EQ X 'FOO)$
∂(m)LISP:∂(n)$$NIL $
∂(m)user:∂(n)$$(SETQ Y (IMPLODE '(F O X)))$
∂(m)LISP:∂(n)$$FOX $
∂(m)user:∂(n)$$(EQ Y (IMPLODE '(F O X)))$∂(o);⊗implode interned ⊗y
∂(m)LISP:∂(n)$$T $
∂(m)user:∂(n)$$(EQ Y (INTERN Y))$
∂(m)LISP:∂(n)$$T $
∂(m)user:∂(n)$$(SETQ Y (MAKNAM '(F O X)))$
∂(m)LISP:∂(n)$$FOX $
∂(m)user:∂(n)$$(EQ Y (INTERN Y))$∂(o);an atom having same pname
∂(m)LISP:∂(n)$$NIL $ ∂(o);but not qeq ⊗y is on the oblist
∂(m)user:∂(n)$$(REMOB Y)$
∂(m)LISP:∂(n)$$NIL $
∂(m)user:∂(n)$$Y$ ∂(o);⊗y is still a prefectly good atom
∂(m)LISP:∂(n)$$FOX $
∂(m)user:∂(n)$$(EQ Y (IMPLODE '(F O X)))$∂(o); but not on the oblist any more
∂(m)LISP:∂(n)$$NIL $
∂(m)user:∂(n)$$(SETQ X (GENSYM))$
∂(m)LISP:∂(n)$$G0123 $
∂(m)user:∂(n)$$(EQ X 'G0123)$∂(o);⊗gensym also makes new
∂(m)LISP:∂(n)$$NIL $ ∂(o);uninterned atoms
.end
.bb List structure.
Now that we can build and recognize atoms, the next step is
to be able to construct representations of arbitrary S-expressions.
Thus we need to represent list structures and implement the operations
⊗car, ⊗cdr, and ⊗cons. LISP sets aside a portion of memory available
to it in an area called list space. This space contain of cons-cells.
As explained in Chapter {section READIN}, a cons-cell must be big
enough to hold two pointers
and could be a single machine word with the two halves being pointers
to the a- and d- parts or if the size of a word is too small, a pair of words.
In either case ⊗car and ⊗cdr can be fairly simply operations (in the case
of the usual PDP-10 implementation a single machine instruction).
At any particular time some of the list space will be in use and
some of it will be free.
The free space is linked into a single list called the
"free storage list" and LISP keeps a pointer to the beginning of that list.
When a ⊗cons is executed, a word is removed from the free storage list,
the a-part and the d-part are filled in and a pointer to this word is
returned. When there is no more free storage left (more accurately when
the amount left reaches some preset minimal amount) then LISP attempts to
obtain more free space. One way is to request it from the host system, but
this is undesirable in general, since a lot of the list space is taken up
by structures which are no longer accessible and may as well be "recycled".
This recycling process is done by what is known as a "garbage collector".
Thus when ⊗cons discovers that it is out of space, it calls the
garbage collecter to find some. The garbage collector must first
find out which parts of list space are in use (accessible) and
which are not. This is known as the marking phase.
In the sweep phase it collects the unused space, adding the recovered
cons-cells to the free storage list.
If it succeeds in finding unused space, ⊗cons can proceed,
if not the computation usually grinds to a halt, printing some suitable
error message.
To determine what parts of list structure are currently accessible
the garbage collector starts with all of the immediately accessible
parts and follows all possible ⊗car and ⊗cdr chains until reaching an
atom or something previously seen (marked). The immediately accessible
parts of list structure include the property lists of all atoms
(including the contents of value cell in case this is kept separately),
structures on whatever stacks LISP is using for control, contents of
any list type arrays, plus probably a few special locations or registers
that LISP may use for special purposes.
There are a number of algorithms of doing the marking and collection of
unused space. It is a technique generally useful in situations where
storage gets allocated and reclaimed. For more details the reader is
referred to Knuth[1968a] or Baker[1978] and references therein.
.bb Other LISP objects.
The structures and operations we have discussed above are sufficient
for a LISP system that is only going to deal with interpreted code and will
only allow S-expressions as data for programs.
Most systems provide at least two additional features.
One is the ability to run compiled code, and the other is the addition of
arrays to the allowed types of data. Thus a space known as binary program
space is usually set aside for storing compiled code. Sometimes arrays
are kept in the same space, and sometimes they are kept in an additional
separate space. In any event, this means that there are two additional
objects that some of the basic LISP routines must recognize and deal with.
One is the "subr" object which is the address of a compiled program and
is associated with the atom naming the program via the $SUBR property.
The other is an array descriptor which will be associated via the
$ARRAY property with the atom naming that array.
In addition to the structures that make up the data manipulated
by a LISP program, the LISP system must maintain structures describing
the current state of a computation. In general there will be one
or more stacks (often called Push-Down-Lists or pdls) where information is
kept as to current and/or old variable bindings (binding contexts or
environments), knowlege about the expression currently being evaluated,
and what to do next (control stack).
Information on these stacks is accessed via pdl-pointers, context pointers
etc..
The LISP system usually provides programs to allow the user to examine
control stacks and environments. We will say something about this later.
.cb Exercises
.item ← 0
#. Write a program ⊗saveup that takes a list of atoms and a list of
properties and collects for each atom and association list of
property/value pairs. ⊗saveup returns a list of atom/alist pairs.
#. Write a program ⊗removem that takes the same arguments as
⊗saveup and removes all the properties from the property lists
of all the atoms.
#. Write a program ⊗restorem that takes a list of the form output
by ⊗saveup and puts back all of the properties on the property lists
of all the atoms.
The above collection of functions are useful for switching environments
in a system written in LISP.
.ss(top,The Top level of LISP.)
The top level of LISP can be described in terms of the programs
⊗read, ⊗eval and ⊗print for reading, evaluating and printing S-expressions.
Namely,
⊗⊗⊗toplevel[] ← qprog[[] top: print eval read[]; qgo top]⊗.
The ⊗read program gets from the input stream
the next string of tokens corresponding to the external representation
of an S-expression and constructs the corresponding list structure representation.
⊗eval computes the value denoted by the expression in the current environment,
and ⊗print converts a list structure into the external representation of
the corresponding S-expression and sends it to the output stream.
In this Chapter we will present programs that embody the recursive
structure of the process of reading and printing S-expressions in both
of the notations (dot and list) discussed in Chapter {section READIN}.
The "slashifying" of special characters is also discussed.
We have already discussed how a simplified version of ⊗eval works
(Chapter {subsection evaluator!}). Here we discuss some of the decisions
that must be made in desigining a LISP interpreter, how they effect
the results computed by your programs, and how LISP uses the "system properties"
stored on the property lists of atoms.
.bb Reading and Printing: "dot" notation
Consider first reading and printing S-expressions in "dot" notation.
Rather than deal with the issue of representing character strings,
we read an print a listified external notation which is a list of atoms.
The special atoms $LP, $RP and $DOT represent the
delimiter tokens "(", ")" and "." respectively and all other atoms
represent themselves.
Let ⊗DOTLIST be the sort of lists that correspond to "dot" notation
representations of S-expressions. They can be characterized as follows:
(i) ⊗⊗< $$<other-atom>$ >⊗ is a ⊗DOTLIST representing the atom $$<other-atom>$,
where $$<other-atom>$ is any atom not in the list of special atoms $LP, $DOT, $RP.
(ii) if ⊗da and ⊗dd are ⊗⊗DOTLIST⊗s then
⊗⊗<$$LP$> * da * <$$DOT$> * dd * <$$RP$>⊗ is a ⊗DOTLIST representing ⊗[x_._y]
where ⊗da represents ⊗x and ⊗dd represents ⊗y.
(iii) Those are all of the lists of sort ⊗DOTLIST.
The program ⊗prindot "print"s a list of sort ⊗DOTLIST representing
the S-expression given as an argument.
The definition of ⊗prindot is:
.begin nofill turnon "∂" select 2
.n←16
.group
!fcnprindot& ∂(n)prindot[e] ← prina[e,qNIL]
∂(n)prina[e, l] ←
!fcnprina& ∂(n+4) qif qat e qthen e . l
∂(n+4) qelse $$LP$ . prina[qa e, $$DOT$ . prina[qd e, $$RP$ . l]]
.end
thus
.begin nofill turnon "∂"
.n←16
.group
∂(n)⊗⊗prindot[$$(PLUS (TIMES A B) C)$] = ⊗
∂(n+8)$$(LP PLUS DOT LP LP TIMES DOT LP A DOT LP B $
∂(n+8)$$ DOT NIL RP RP RP DOT LP C DOT NIL RP RP RP),$
.end
Notice that the recursive structure of ⊗prindot is very much like
that of ⊗flatten ({eqn flatten!!}). It simply inserts the delimiter characters
in the appropriate places.
The program ⊗readdot "read"s a list of atoms. If the list is
of the form ⊗⊗dl * u⊗ for some ⊗DOTLIST, ⊗dl and list ⊗u
then the result is ⊗⊗e_._u⊗ where ⊗e is the S-expression
represented by ⊗dl. Otherwise the atom $READ-ERROR is returned.
The definition of ⊗readdot is:
.begin nofill turnon "∂" select 2
.turnoff "{}"
.n←5
.m← 55
.group
∂(n)readdot u ←
∂(n+8)qif qat u qthen $$READ-ERROR$
∂(n+8)qelse qif qa u qequal $$LP$ qthen
!fcnreaddot& ∂(n+12){readdot qd u}[λw: ∂(m)%1;read a-part%*
∂(n+16)qif qat w ∨ ¬[qad w qequal $$DOT$] qthen $$READ-ERROR$∂(m)%1;check for error%*
∂(n+16)qelse {readdot qdd w}[λv: ∂(m)%1;read d part%*
∂(n+20)qif qat v ∨ ¬[qad v qequal $$RP$] qthen $$READ-ERROR$∂(m)%1;check for error%*
∂(n+20)qelse [qa w . qa v] . qdd v]]∂(m)%1;⊗cons up the result%*
∂(n+8)qelse u
.turnon "{}"
.end
thus
.begin nofill turnon "∂"
.n←16
.group
∂(n)⊗⊗qa readdot[$$(LP PLUS DOT LP LP TIMES DOT LP A DOT LP B $⊗
∂(n+10)⊗⊗$$ DOT NIL RP RP RP DOT LP C DOT NIL RP RP RP)$] = ⊗
∂(n+8)$$(PLUS (TIMES A B) C)$.
.end
.bb Reading and Printing: mixed "list-dot" notation
Now we treat the more general mixed "list-dot" notation.
Let ⊗LDOTLIST be the sort of lists that correspond to "list-dot" notation
representations of S-expressions. They can be characterized as follows:
(i) ⊗⊗< $$<other-atom>$ >⊗ is a ⊗LDOTLIST representing the atom $$<other-atom>$,
where $$<other-atom>$ is any atom not in the list of special atoms $LP, $DOT, $RP.
if ⊗di ⊗⊗1≤i≤n⊗ and ⊗dd are ⊗⊗LDOTLIST⊗s representing the S-expressions ⊗xi and ⊗xd
then
(iia) ⊗⊗<$$LP$ $$RP$>⊗ is an ⊗LDOTLIST representing $$()$ or qNIL.
(iib) ⊗⊗<$$LP$> * d1 * ... * dn * <$$RP$>⊗ is an ⊗LDOTLIST representing ⊗⊗<x1_..._xn>⊗
(iic) ⊗⊗<$$LP$> * d1 ... dn * <$$DOT$> * dd * <$$RP$>⊗ is a ⊗DOTLIST representing
⊗⊗[x1_._[_..._[xn_._xd]_..._]]⊗
(iii) Those are all of the lists of sort ⊗DOTLIST.
The program ⊗prinlis "print"s an S-expression in "list" notation.
This is the extreme form of list-dot notation where ⊗dd following $DOT
in clause (iic) must be an atom.
To print non-atomic S-expression in this style, start by printing $LP, then
⊗cdr through the S-expression printing the ⊗car until an atom is reached.
If this atom is qNIL the S-expression is indeed
a list and the printing is terminated with $RP. Otherwise the printing
is terminated with $DOT followed by the atom followed by $RP.
⊗prinlis is defined by:
.begin nofill turnon "∂" select 2
.n←16
.group
!fcnprinlis& ∂(n)prinlis[e] ← prinb[e,qNIL]
∂(n)prinb[e, l] ←
∂(n+4) qif qat e qthen e . l
!fcnprinb& ∂(n+4) qelse $$LP$ . [qif qn qd e qthen prinb[qa e, $$RP$ . l]
∂(n+12)qelse qif qat qd e qthen prinb[qa e, $$DOT$ . [qd e . [$$RP$ . l]]]
∂(n+12)qelse prinb[qa e, qd prinb[qd e, l]]]
.end
thus
.begin nofill
⊗⊗⊗prinlis[$$(PLUS (TIMES A B) C)$] = $$(LP PLUS LP TIMES A B RP C RP).$⊗
and
⊗⊗⊗ prinlis[$$(A . ( B . (C . D)))$] = $$(LP A B C DOT D RP).$⊗
.end
⊗prinlis also has the same basic structure as ⊗prindot, but
makes some intermediate tests in order to omit all but the essential
delimiter characters.
The program ⊗read gobbles up an initial segment of a list ⊗dl_*_u
and returns ⊗[e_._u] where ⊗e is the S-expression represented by ⊗dl in
list-dot notation.
The auxiliary program ⊗reada does the work. It the uses the auxiliary
variable ⊗l to stack partial results.
The basic idea is the following: if
the next atom is $LP then and gobble up 0 or more S-expressions from the
remaining list, stacking them on ⊗l, until a delimiter ($DOT or $RP) is reached.
If the delimeter is $RP then the result is the ⊗reverse of ⊗l.
If the delimeter is $DOT then gobble up one more S-expression and
return ⊗l reversed onto that.
The definitions of ⊗read and ⊗reada are as follows:
.begin nofill turnon "∂" select 2 turnoff "{}"
.n←16
.group
!fcnread& ∂(n)read u ← qif qa u qequal $$LP$ qthen reada[qd u, qNIL] qelse u
∂(n)reada[u, l] ←
∂(n+4)qif qn u qthen rev1[l, $$ERROR$] . qNIL
!fcnreada& ∂(n+4)qelse qif qa u qequal $$RP$ qthen reverse l . qd u
∂(n+4)qelse qif qa u qequal $$LP$ qthen {reada[qd u, qNIL]}[λw: reada[qd w, qa w . l]]
∂(n+4)qelse qif qa u qequal $$DOT$ qthen {reada[qd u, qNIL]}[λw: rev1[l, qaa w] . qd w]
∂(n+4)qelse reada[qd u, qa u . l]
!fcnrev1& ∂(n)rev1[u, v] ← qif qn u qthen v qelse rev1[qd u, qa u . v]
.end
thus
.begin nofill turnon "∂"
.n←16
.group
∂(n)⊗⊗qa read[$$(LP PLUS LP TIMES A B RP C RP)$=$$(PLUS (TIMES A B) C)$ ⊗
.end
We note that ⊗read and ⊗reada don't check the syntax completely, so
some non-wellformed lists will be "read" without complaint, although
the result may be strange.
We end the discussion of reading and printing with some remarks
on the treatment of special characters in LISP.
Implicit in the functioning of the LISP scanner in the fact that
certain characters such as the delimiters "(", ")", "." and " " (<space>)
are treated specially and cannot directly be included in the names of
atoms. LISP systems usually provide a way of including these
special characters in an atom name by designating an additional
special character to serve as an escape character. Thus any
character following the escape character in an input string is treated
as an ordinary letter. This is ofter called "slashifying" as the designated
escape character is often the "/" character.
Thus $(AB/.D) is a list of one element, the atom whose pname is "$$AB.D$"
whereas $(AB.D) is a pair, the a-part being $AB and the d-part being $D.
One result of this "slashifying"
convention is that many of the pname manipulating and printing functions have
several flavors according to whether the name or the slashified version is being
dealt with. We mention the fact only to make you aware of it. You should
consult your LISP manual for further details. You should also be aware of the
fact that "/" itself (or whatever the designated escape character is)
being a special character is gobbled up by the LISP scanner,
thus if you really want the symbol "/" you must type "//".
.bb Evaluation
In LISP systems the evaluator doesn't require an a-list
(environment) argument. Instead it uses the property lists
and environment (binding context) stacks or some equivalent
structure to store the corresponding information.
The simple ⊗eval was given as a recursive function and thus much
of the control structure is automatic in the recursion structure.
Actual interpreters are implemented iteratively, and use control
stacks to keep track of what they are doing and what is left to be done.
There are several important decisions to be made in the design of
an interpreter. These include the mechanism for making, saving and
restoring variable bindings, how to evaluate or interpret the
expression occuring in functional position of an application term,
when and how to bind the free variables in function expressions.
(This relates to the infamous "funarg" problem.)
If you only write simple clean pure LISP programs in which
no free variables occur in function expressions then these decisions
probably won't effect the meaning of your programs. However, most
programs are like that.
In the following we discuss some possible decisions and explain the use of
system properties.
You should consult the manual for LISP you plan to use to find out
what decisions have been made in designing the interpreter.
A detailed analysis of binding and control mechanisms
in LISP can be found in Allen[1979].
A very nice iterative description of a LISP like interpreter is the SCHEME
interpreter given in Sussman and Steele[1978]. It also shows how control and
environment stacks are handled in a particularly nice manner.
When LISP begins to evaluate the application of a λ-expression
to a list of arguments it must associate the variables named in
the variable list of the expression with the values of the corresponding
arguments. Since this evaluation may occur in the midst of the evaluation
of a bigger expression where the same variable names occur in the outer
expression, the old values must be saved and restored when the current
evaluation is completed. Similarly when a qprogram is entered the
current values of the qprog variables must be save and restore when
the qprog is exited. (The interpreter may initialize these variables
or leave them unbound.)
The decisions that must be made include: where to keep the current
binding, how to save and restore bindings, and how to access outer
bindings of variables. These are not independent.
For example the current binding of a variable could be kept always
as the $VALUE property. When ever a new binding is made the old
one is saved on the binding context stack and restored when the
qprog or λ is exited. Another possibility is to consider the
$VALUE property as the top level, global environment and put
new bindings on the stack. Thus access to value is slower, but
restoring is faster.
For expressions occuring in the "function" position of a term,
the interpreter must decide what program is denoted. If the expression
is a λ-expression then the λ-variables are bound to the values of
the arguments and the body is evaluated.
If the expression is an atom there many possible conventions.
One is to always look for an applicative property (as discussed below)
on the property list of the atom and use that value. If there
are several, the reasonable choice is the first one found as
it is the most recent property set.
Another possibility would be to check the local environment first
to see if the atom is bound to an applicable expression. If none is found
then look on the property list.
Some LISPs use the $VALUE property for every thing and in the
case where an applicable object is needed, will repeat the evaluation
process until such an object is found, or it can't go further for some
reason.
The problem of when and how to assign values to the free variables in
a function-expression is the subject of much controversy in the LISP world.
The question is where to "trap" the free-variables. It is a problem
that plagued the logicians in the early days of the development of
formal logic and also of λ-calculus. It arises when you define
what it means to substitute an expression for occurrences of a bound
variable in some other expression. The logicians solution is to
not allow trapping to occur. Namely, outer bound variables are renamed
if they occur free in the expression being substituted. Thus free variables
remain free. In computation there are times when it is convenient to
allow trapping of free-variables. For example the function-expression
passed to a program may just be a piece of code extracted from that
program to make it easier to read and you intend that the same
name in both pieces of code refer to the same thing. The original LISP
interpreter provided maximal trapping in the sense that variables
function-expressions were always assumed to refer the the most recent
binding. (This was more of an oversight than a concious decision.)
Current LISPs usually provide both alternatives to some degree.
In MACLISP evaluating
⊗⊗⊗$$(FUNCTION (LAMBDA (X) (PLUS X Y)))$⊗
produces the undecorated λ-expression
⊗⊗⊗$$(LAMBDA (X) (PLUS X Y))$⊗
which when applied will use the value of $Y at the time of application,
while evaluating
⊗⊗⊗$$(*FUNCTION (LAMBDA (X) (PLUS X Y)))$⊗
produces the "funarg" term
⊗⊗⊗$$(FUNARG (LAMBDA (X) (PLUS X Y)) <current-bcp>)$⊗
where $<current-bcp> is a pointer to the current binding context. When applied
the context indicted by the 3rd element of a funarg is used instead of the
binding context in force at application time. There are many subtilties
involved in solving the "funarg" problem. For example, what if the binding
context referred to in a MACLISP funarg expression has disappeared before
it is used? This is a "funval" problem in the sense that it is when the
funarg expression is passed up as a value of a computation to be used later
the binding context is likely to have disappeared (popped from the stack).
We won't pursue the problem further.
[References?]
We conclude the discussion of evaluation with a brief description
of the use of some of the standard system propertys other that $PNAME and
$VALUE. An atom with an $EXPR property is assumed by LISP to denote an
interpretable program and the value of the $EXPR property
should be an application term (such as a program name, a λ-expression,
or a label-expression) suitable for being interpreted.
Recall that this property can be set
using one of the operations $DEFUN or $DEFPROP described
in Chapter {section READIN}. An atom with a $SUBR property denotes a compiled
program which may be directly executed. The value of the $SUBR property is
a subroutine pointer which points to the compiled program. Atoms denoting
built in program such as $REVERSE or $APPEND have a $SUBR property
automatically. When a compiled program is loaded into LISP the name of
that program gets a $SUBR property. The properties $FEXPR and $LEXPR
are similar to $EXPR except that the arguments to the program are treated
differently. In the $EXPR case a fixed number of arguments are expected
and they are evaluated before executing the program. In the $FEXPR case
the list of arguments (eg. the d-part of the expression)
is passed unevaluated to the program as a single entity.
In the $LEXPR case the number of arguments is arbitrary, the arguments
are evaluated and saved (typically on the stack),
the number of arguments is passed to the program
and the program must use a special auxiliary program ⊗args to access the argument
values. The three $SUBR cases are analogous to those for $EXPR.
We can express the difference between $DEFPROP and $PUTPROP explained above
by saying that $DEFPROP is an $FSUBR and $PUTPROP is a $SUBR.
If an atom has a $MACRO property then that property should be
a application term taking one argument. When such an atom is encountered
as the a-part of an expression being evaluated, the corresponding macro definition
is applied to the entire expression, as the single argument, and the value is
a new expression that is evaluated in place of the original one.
An atom can be given the $MACRO property using $DEFUN with type $MACRO
specified or some other macro-definition facility provided by the system
or defined by the user.
See Chapter {subsection macro!} for more details.
An atom with an $ARRAY property denotes an array. The value
of the $ARRAY property is an array pointer which provides LISP with
a means of accessing the array. The $ARRAY property can be set using
the $ARRAY command as explained in Chapter {subsection array!}.
The array pointer includes type, bounds, and location information.
.cb Exercises
.item ← 0
#. Modify ⊗read and ⊗reada to do some syntax checking and return
"error messages" is case of an ill-formed argument.
#. Rewrite ⊗readdot as a sequential program with out any recursive calls.
.ss(edit,Editing LISP programs.)
An important part of any LISP system is an interacitve editor.
With the help of the editor you can write and modify programs.
Some editors allow you to evaluate expressions with out leaving
the editing environment. They may also provide facility for
editing S-expressions other than programs. The fact that
LISP programs are S-expressions with a very simple syntax means that
it is easy to write a simple but powerful LISP program editor in LISP.
In this section we will present a simple interactive editor
based on the MACLISP editor. Before describing this program let us
consider what we want from a program editor.
First it should be able to lookup the current program given its name
and be able to replace it by a modified version.
Second it should be able to move around in the program being edited
thus changing its notion of the ⊗current ⊗expression (known as
the $$CE$). Third it should be able to insert and delete
sub-expressions (elements of the $$CE$). This is sufficient capability
to convert any program into any other. It is also somewhat essential
to be able to look at the $CE, thus the editor should be able to
print the $CE. Other features which are
useful include moving parentheses "in and out", wrapping a subsequence
of a list into a list -- e.g. inserting a par of parentheses,
or unwrapping an element of a list and splicing its elements into
the containing list -- e.g. erasing a pair of parentheses,
finding the first occurrence of some expression -- e.g. changeing the
$CE to be that occurrence or maybe the list containing that occurrence,
replacing all occurrences of an expression by another expression --
either at the top level of the $CE or at all levels, transposing
members of the $CE, moving an expression from on place to another,
or reversing the $CE. These are only a few ideas.
We implement a representative selection. The reader should be able
to implement any others without much difficulty.
Some editors also have file manipulation capability,
so old definitions can be looked up in
files and new ones saved or used to update a file.
We will not treat this aspect of editing.
The program ⊗editor takes program name as an argument. When invoked,
⊗editor looks up the program by getting the $EXPR property,
initializes the global parameters ⊗fn ⊗top, ⊗ce and ⊗chain
and then goes into a command reading - executing loop.
The parameter ⊗fn is just to save the program name. ⊗top is the program
being edited. ⊗ce the current expression (⊗top initially).
⊗chain the chain of expressions leading from ⊗top to ⊗ce.
If ⊗top=ce the ⊗⊗chain=qNIL⊗ otherwise the first element of ⊗chain
is a pair ⊗[n_._e] where ⊗e is the expression immediately containing
⊗ce and ⊗n is the position of ⊗ce in ⊗e.
There are two kinds of commands those handled directly by
the editor and those which the editor uses to look up what should be
done. The direct commands are $Q, $OK, or $<n> for any number $n.
$Q means exit the editor, $OK means replace the old version of the
program being edited by the new one and then exit.
$<n> mean replace the ⊗ce by the nth element of the ⊗ce.
There are two type of lookup commands, atomic and list.
For atomic commands the editor looks up the program to execute
on the $ATOMIC-EDIT-FNS property of that atom and runs it.
For list commands the editor looks up the program to execute
on the $LIST-EDIT-FNS property of ⊗⊗qa command⊗ and
applies it to ⊗⊗qd command⊗.
If in either case the property is not found then the command is
evaluated as an ordinary term and the result printed.
This style of programming is sometimes called "data directed". It
makes the editor extremely easy to extend or modify. Since the
data structures manipulated are quite simple and easily explained,
it can even be explained to a user in sufficient detail that the
user can implement his own favorite commands.
The reason that the numeric commands are treated
directly is that numbers don't have property lists.
The editor could be made more uniform by giving these commands
symbolic names, but moving down into an expression via such
commands is sufficiently convenient to make it work implementing the
special case.
We take some care in the editor and command codes
to make sure the command will work in the context it has been given.
If not an error message is printed and no action occurs.
Many of the commands work destructively. This is safe because the expression
to edit is a "copy" of the original definition. In fact one might
consider it essential to edit destrictively. Since otherwise, after each
change the expression would have to be rebuilt. At least it is in the
spirit of editing to make destructive changes, since the point of editing
in to change the expression.
The program ⊗editor and its auxiliary functions are defined as follows.
.begin nofill turnon "∂"
.n←8
∂(n)⊗⊗ editor l (FEXPR) ← ⊗
∂(n)⊗⊗ qprog [fn, top, ce, chain, command, efn]⊗
∂(n)⊗⊗ fn ← qa l⊗
∂(n)⊗⊗ top ← copy get[fn, $$EXPR$]⊗
∂(n)⊗⊗ qif qn top qthen [errmsg0[]; qreturn $$NO-EDIT$]⊗
∂(n)⊗⊗ ce ← top; chain ← qNIL⊗
∂(n)⊗⊗ $$EDLOOP:$ ⊗
∂(n)⊗⊗ print $$ε$⊗
∂(n)⊗⊗ command ← read[]⊗
∂(n)⊗⊗ qif command = $$Q$ qthen qreturn $$BYE$⊗
∂(n)⊗⊗ qelse qif command = $$OK$ qthen qreturn putprop[fn, top, $$EXPR$]⊗
∂(n)⊗⊗ qelse qif numberp command qthen ⊗
!fcneditor& ∂(n)⊗⊗ [qif qat ce ∨ command > length ce qthen [errmsg1[], qgo $$EDLOOP$], ⊗
∂(n)⊗⊗ chain ← [command . ce] . chain, ⊗
∂(n)⊗⊗ ce ← nth[ce, command], ⊗
∂(n)⊗⊗ qgo $$EDLOOP$]⊗
∂(n)⊗⊗ qif qat command qthen efn ← get[command, $$ATOMIC-EDIT-FN$]⊗
∂(n)⊗⊗ qelse efn ← get[qa command, $$LIST-EDIT-FN$]⊗
∂(n)⊗⊗ qif efn = qNIL qthen [print eval command, qgo $$EDLOOP$]⊗
∂(n)⊗⊗ qelse qif qat command qthen [eval efn, qgo $$EDLOOP$]⊗
∂(n)⊗⊗ qelse qif ¬qat command qthen ⊗
∂(n)⊗⊗ [apply[efn, qd command], qgo $$EDLOOP$]⊗
∂(n)⊗⊗ errmsg0[] ← [print fn, princ $$`not an EXPR'$]⊗
∂(n)⊗⊗ errmsg1[] ← [terpri[], princ $$`> length of CE'$]⊗
∂(n)⊗⊗ errmsg2[] ← [terpri[], princ $$`Unknown command'$]⊗
∂(n)⊗⊗ errmsg3[] ← [terpri[], princ $$`You are at the top'$]⊗
∂(n)⊗⊗ errmsg4[] ← [terpri[], princ $$`You are at the left edge'$]⊗
∂(n)⊗⊗ errmsg5[] ← [terpri[], princ $$`You are at the right edge'$]⊗
∂(n)⊗⊗ errmsg6[] ← [terpri[], princ $$`CE is atomic'$]⊗
!fcnnth&ed ∂(n)⊗⊗ nth[u, n] ← qif n > 1 ∧ ¬qn qd u qthen nth[qd u, sub1 n] qelse qa u⊗
!fcnpos&ed ∂(n)⊗⊗ pos[u, n] ← qif n > 1 ∧ ¬qn qd u qthen pos[qd u, sub1 n] qelse u⊗
!fcncopy&ed ∂(n)⊗⊗ copy x ← qif qat x qthen x qelse copy qa x . copy qd x⊗
.end
.begin nofill
We have implemented the following atomic commands:
$P ; print the ⊗⊗ce⊗,
$UP ; move up one level in the chain,
$RT ; move one to the right,
$LI ; move the left parenthesis in one
thus removing the first element of the ⊗ce and inserting it before the ⊗⊗ce⊗), and
$RO ; move the right parenthesis out one
thus making the right sibling of the ⊗ce the last element of the ⊗⊗ce⊗.
.end
The expressions put on the $ATOMIC-EDIT-FN properties are shown below.
If you have trouble figuring out how some of the pointer manipulations
work we suggest drawing a list structure (boxes and pointers) of some
$CE and following through the sequence of assignments.
.begin nofill TURNON "∂"
.n←5;m←52
!fcnp&cmd ∂(n)⊗⊗ p:$$ATOMIC-EDIT-FN$ ← print ce⊗ ∂(m);print the ⊗ce
.group
∂(n)⊗⊗ up:$$ATOMIC-EDIT-FN$ ← qprog []⊗ ∂(m);⊗ce ← parent expression
!fcnup&cmd ∂(n)⊗⊗ qif qn chain qthen qreturn errmsg3[]⊗
∂(n)⊗⊗ ce ← qda chain⊗
∂(n)⊗⊗ chain ← qd chain⊗
.apart
.group
∂(n)⊗⊗ rt:$$ATOMIC-EDIT-FN$ ← qprog [n]⊗ ∂(m);⊗ce ← expression to right
∂(n)⊗⊗ qif qn chain qthen qreturn errmsg3[]⊗
!fcnrt&cmd ∂(n)⊗⊗ n ← add1 qaa chain⊗
∂(n)⊗⊗ qif n > length qda chain qthen qreturn errmsg5[]⊗
∂(n)⊗⊗ ce ← nth[qda chain, n]⊗
∂(n)⊗⊗ qaa chain ← n⊗
.apart
.group
∂(n)⊗⊗ li:$$ATOMIC-EDIT-FN$ ← qprog [cetmp, pos]⊗ ∂(m);move left paren in
∂(n)⊗⊗ qif qn chain qthen qreturn errmsg3[]⊗
∂(n)⊗⊗ qif qat ce qthen qreturn errmsg6[]⊗
!fcnli&cmd ∂(n)⊗⊗ pos ← pos[qda chain, qaa chain]⊗ ∂(m);point to ⊗ce
∂(n)⊗⊗ qd pos ← qd ce . qd pos⊗ ∂(m);insert ⊗⊗qd ce⊗ after ⊗ce
∂(n)⊗⊗ qa pos ← qa ce⊗ ∂(m);replace ⊗ce by ⊗⊗qa ce⊗
∂(n)⊗⊗ ce ← qd ce⊗
∂(n)⊗⊗ qaa chain ← add1 qaa chain⊗
.apart
.group
∂(n)⊗⊗ ro:$$ATOMIC-EDIT-FN$ ← qprog [cetmp, pos, pos1]⊗ ∂(m);move right paren out
∂(n)⊗⊗ qif qn chain qthen qreturn errmsg3[]⊗
∂(n)⊗⊗ qif qat ce ∧ ¬qn ce qthen qreturn errmsg6[]⊗
∂(n)⊗⊗ pos ← pos[qda chain, qaa chain]⊗
∂(n)⊗⊗ pos1 ← qd pos⊗
!fcnro&cmd ∂(n)⊗⊗ qif qn pos1 qthen qreturn errmsg5[]⊗
∂(n)⊗⊗ cetmp ← qNIL . ce⊗ ∂(m);in case ⊗ce is qNIL
∂(n)⊗⊗ nconc[cetmp, pos1]⊗ ∂(m);add next element to end of ⊗ce
∂(n)⊗⊗ qd pos ← qd pos1⊗ ∂(m);delete it from parent list
∂(n)⊗⊗ qd pos1 ← qNIL⊗
∂(n)⊗⊗ ce ← qd cetmp⊗
∂(n)⊗⊗ qa pos ← ce ⊗ ∂(m);make sure new ⊗ce is in parent list
.apart
.end
.begin nofill
We have implemented the following list commands:
($I <n> <exp>) ; insert <exp> before the <n>th element of the ⊗ce,
($D <n>) ; delete the <n>th element of the ⊗ce.
.end
The programs for executing these commands are given by
.begin nofill turnon "∂"
.n←8;m←52
.group
∂(n)⊗⊗ i:$$LIST-EDIT-FN$ ← ⊗ ∂(m);insert ⊗x before the ⊗⊗n⊗th element
∂(n)⊗⊗ λn x.[qprog [pos, tmp]⊗
∂(n)⊗⊗ qif n < 0 qthen qreturn errmsg2[]⊗
∂(n)⊗⊗ qelse qif n = 1 qthen ⊗
!fcni&cmd ∂(n)⊗⊗ qif qn chain qthen top ← [ce ← x . ce]⊗
∂(n)⊗⊗ qelse [ce ← x . ce ; ⊗
∂(n)⊗⊗ ______ qa pos[qda chain, qaa chain] ← ce]⊗
∂(n)⊗⊗ pos ← pos[ce, sub1 n]⊗ ∂(m);point to position for insertion
∂(n)⊗⊗ tmp ← x . qNIL⊗ ∂(m);make list containing ⊗x
∂(n)⊗⊗ qd tmp ← qd pos ⊗ ∂(m);splice it in
∂(n)⊗⊗ qd pos ← tmp ] ⊗
.apart
.group
∂(n)⊗⊗ d:$$LIST-EDIT-FN$ ← ∂(m); delete the ⊗⊗n⊗th element of the ⊗ce
∂(n)⊗⊗ λn.[qprog [pos, tmp]⊗
∂(n)⊗⊗ qif n < 0 qthen qreturn errmsg2[]⊗
∂(n)⊗⊗ qelse qif n = 1 qthen ⊗
!fcnd&cmd ∂(n)⊗⊗ qif qn chain qthen top ← [ce ← qd ce]⊗
∂(n)⊗⊗ qelse [ce ← qd ce ; ⊗
∂(n)⊗⊗ _____ qa pos[qda chain, qaa chain] ← ce]⊗
∂(n)⊗⊗ qelse qif n > length ce qthen qreturn errmsg2[]⊗
∂(n)⊗⊗ pos ← pos[ce, sub1 n]⊗
∂(n)⊗⊗ qd pos ← qdd pos ]⊗
.apart
.end
.cb Exercises
.item←0
#. Write programs to execute the atomic commands $LO, and $RI (see the
description of $RO and $$LI$).
#. Modify the list commands to take a negative argment which is interpreted
as count from the end rather that the beginning of the ⊗ce.
#. Write a programs for executing list-commands of the form
($R <e1> <e2>) (which means to replace occurrences of <e1> as elements
of the ⊗ce by <e2>) and
($TR <e1> <e2>) (which means to replace occurrences of <e1> as subexpression
(at any level) of ⊗ce by <e2>). Note that it is advisable to make a new copy
of <e2> for each replacement otherwise in later editing a change to one occurrence
may have the undesired side effects of changing all occurrences.
#. Write a program to execute the command ($MV <m> <n>) which moves the
element in <m>th position to the <n>th position.
#. Consider implementing an $UNDO feature in the editor which would
undo the last command (or last n commands).
One way to do this is to notice that commands that make changes have inverses.
For example of $LI is $LO, of $$(I <n> <exp>)$ is $$(D <n>)$, etc. All we
need is for the editor to setup a list for storing undoing commands and
require that each command that makes a change stack its inverse on the
undo list.
.ss(eror,Error Handling.)
A robust and helpful LISP system will do all it can to keep
you from destroying it. This includes noticing when you have asked
it to evalute an ill formed or illegal expression. In this class
we include attemping to evaluate a variable that hasn't been bound
($$SETQ$ed or bound as a $LAMBDA or $PROG variable),
taking the ⊗car or ⊗cdr of an atom or other non
list structure object such as an array pointer,
attempting to append a non-list to some S-expression,
applying a function to the wrong number of arguments, etc..
When such an error is noticed
LISP will complain, printing a (hopefully informative) message,
and take some appropriate action. Other errors such as an infinite
recursion can't be prevented by inspection of the expression to be
evaluated, but will be noticed when some abnormal state is reached
such as when the control stack overflows or an attempt is made
to read or write in an illegal location. Again LISP will complain
by printing a message saying what abnormal condition was noticed
and take appropriate action. Of course there is always the possibility
of a truly disastrous error from which there is no recovery and here
there is not much to do but restart and proceed with caution.
The question remains as to what to do when an error is noticed.
Typically LISP will enter a state where
a debugging package can take over and help you find out what is amiss.
In this state you will be able to evaluate expressions, possibly back up
the computation some number of steps, have the computation proceed
(after providing whatever information was lacking and caused the error)
or abandon ship and return to the top level to start again.
In order to aid in error recovery and generally
provide the user with the ability to access information about the
state of a computation, LISP provides various means of altering the
normal progress of evaluation and examining and modifying the environment.
One such means is the break loop. Evaluating ⊗⊗break[name,test]⊗ causes
LISP to evaluate ⊗test and if it is not qNIL to go in to a break loop.
The ⊗name argument is printed out as a message and then a pseudo
read-eval-print loop is entered, where the expression read is
tested to see if it is one of a few special "escape from the loop"
commands. If so the loop is exited in the manner indicated by the
command. Otherwise the form is evaled, printed and we are back at the
top of the loop. There are at least two modes of return from a break loop.
One is just to return to the top level, aborting whatever computation
the break occured in, and the other is to resume the computation
(possibly having altered the state of the world before doing so).
To get a complete picture of the state of the computation
it is useful to know the sequence of events that led up to the current
state. This is made possible by stacking such information as evaluation
proceeds. Just how this is done and what information is kept
depends on the particular implementation. For the sake of example
we will base our description on the MACLISP implementation.
Imagine that there is a stack
(called the specpdl) where each time the interpreter begins
to evaluate an S-expression it pushes onto the stack the following
information: the current specpdl pointer, the expression to be evaluated,
an a pointer to the current variable binding environment,
(MACLISPs version of the a-list argument to ⊗eval). The current stack pointer
is then reset to the new stack top and evaluation continues. When evaluation of
the expression is complete that collection of information is popped from
the stack. There are two primitive operations associated with this
stack. One, ⊗evalframe[pdl], returns a list of the information
at the stack position indicated by pdl. Thus we can use this information
to move up and down the stack and evaluate expressions in various
environments. The other operation causes the interpreter to change its
notion about the current top of the stack. This is called ⊗freturn
(for forced return). If ⊗freturn[pdl,exp] is evaluated then the interpreter
behaves as though ⊗pdl were the top of the stack and
the value of ⊗exp were the result of evaluating the expression stored there.
It thus "returns" from that state and proceeds. This is a highly non-local
form of qgo and should be used with caution, however it is most useful
for building error-recovery and debugging routines.
Finally there are primitives for inducing and trapping errors.
The main primitives of interest are the pair of functions ⊗error and
⊗errset. In the simplest case ⊗error takes one argument which
is a message to be printed. Evaluation of ⊗⊗error msg⊗ causes the
value of ⊗msg to be printed and a LISP error to be signaled.
The result of signalling an error is popping up to the nearest ⊗errset
or to the top level of LISP if there is no enclosing ⊗errset.
The value returned is qNIL.
⊗errset has two arguments, the first is an expression to be evaluated
and the second is a flag. If an error occurs while evaluating the expression,
the ⊗errset "traps" the error and returns the value returned by the error
function, otherwise ⊗errset returns
a list of one element, the value of the expression. If the flag is off (eg. if
it is qNIL) then no error messages are printed.
The reason for ⊗errset returning a list containing the value of the successfully
evaluated expression is to distinguish the value qNIL from the atom qNIL which
says that an error has occurred.
An alternate form of error generation
is ⊗err which takes a single argument. Evaluation of ⊗⊗err x⊗ causes
an error to be signaled and the value of ⊗x is returned.
There are many variations and extensions of the above error functions
which you should learn about for your particular implementation of LISP.
In addition to trapping errors, the ⊗err, ⊗errset mechanism can
be used to return from the midst of a computation when the answer has
been found and it is not necessary to process the result further. For example
if you are looking for a particular atom in an S-expression by doing
a recursive search, then if the atom is found, executing ⊗err pops directly
back to the calling ⊗errset without doing the intermediate book keeping
necessary to return through many levels of recursive calls.
MACLISP provides a "structured" form of such non-local returns in the form
of the ⊗catch/throw construct. Each takes two arguments an expression to
evaluate and a tag. ⊗⊗catch[e,tag]⊗ is evaluated by evaluating ⊗e.
If an expression of the form ⊗⊗throw[x,tag1]⊗ is evaluated in the course of
evauating ⊗e and ⊗tag=tag1 then the evaluation of ⊗e is abandonded an the
value of ⊗x is returned as the value of the ⊗catch. If the evaluation of ⊗e
is completed then its value is returned as the value of the ⊗catch.
We will have more to say about this and other mechanisms of controlling
evaluation of expressions in a later chapter.
.ss(debbug,Debugging Aids in LISP.)
In some cases looking at the list of function
calls leading up to an error and at the values of some of the variables
is sufficient to pin point an error in a program, however it is often useful
to have more powerful debugging facilities available. We will look at
two kinds of debugging aids. The first is the sort of debugging
package that modifies function definitions to print information about
the state of the computation and perhaps interact with the user.
This includes such features as tracing and breaking.
The second is the sort of program that will take over after a break loop
has been entered and help you figure out what has happened.
When a function is traced, each time it is called
information is printed out giving the function name and the
values of the arguments. When it returns a value an additional line is
printed out giving the name of the function and the value being returned.
In order to make the output somewhat more readable, the printed
lines could be indented according to the depth of nesting of
calls to a traced function. To trace a function one replaces the
the original function definition with a program that does the
initial printing, applies the original definition to the arguments,
does the final printing and returns the value. This program is
also responsible for updating any global variables that are used to
keep track of the recursion depth of a call.
Thus we could build a small trace package using the following
programs.
.begin nofill turnon "∂"
.n←16
∂(n )⊗⊗trace fn ← ⊗
∂(n+4)⊗⊗qprog [def]⊗
∂(n+8)⊗⊗qif get[fn, $$TRACED$] qthen qreturn $$ALREADY-TRACED$⊗
∂(n+8)⊗⊗def ← get[fn, $$EXPR$]⊗
∂(n+8)⊗⊗qif qn def qthen qreturn $$NOT-DEFINED$⊗
∂(n+8)⊗⊗qif ¬[qa def = $$LAMBDA$] qthen qreturn $$NOT-LAMBDA$⊗
∂(n+8)⊗⊗putprop[fn, qT, $$TRACED$]⊗
∂(n+8)⊗⊗putprop[fn, def, $$!OLDDEF$]⊗
!fcntrace& ∂(n+8)⊗⊗putprop[fn, ⊗
∂(n+14)⊗⊗subst[fn, ⊗
∂(n+18)⊗⊗$$?FN$, ⊗
∂(n+18)⊗⊗subst[qad def, ⊗
∂(n+22)⊗⊗$$?ARGS$, ⊗
∂(n+22)⊗⊗subst[$$LIST$ . qad def, ⊗
∂(n+26)⊗⊗$$*ARGS$, ⊗
∂(n+26)⊗⊗get[$$!TRACE$, $$!PATTERN$]]]], ⊗
∂(n+14)⊗⊗$$EXPR$]⊗
∂(n+8)⊗⊗qreturn $$OK$⊗
∂(n )⊗⊗untrace fn ← ⊗
∂(n+4)⊗⊗qprog [def]⊗
∂(n+8)⊗⊗qif ¬get[fn, $$TRACED$] qthen qreturn $$NOT-TRACED$⊗
∂(n+8)⊗⊗def ← get[fn, $$!OLDDEF$]⊗
!fcnuntrace& ∂(n+8)⊗⊗putprop[fn, qNIL, $$TRACED$]⊗
∂(n+8)⊗⊗remprop[fn, $$!OLDDEF$]⊗
∂(n+8)⊗⊗putprop[fn, def, $$EXPR$]⊗
∂(n+8)⊗⊗qreturn $$OK$⊗
.end
In order for the ⊗trace routine to work we define the $$!PATTERN$ property
by evaluating
.BEGIN nofill SELECT 6
⊗⊗defprop[⊗!TRACE,
(LAMBDA ?ARGS
(PROG (!VAL)
(TERPRI) (MARKS !ILEVEL)
(PRINC (QUOTE Entering )) (PRINC (QUOTE ?FN))
(PROG (ARGL)
(SETQ ARGL (QUOTE ?ARGS))
L1
(COND ((NULL ARGL) (RETURN qNIL)))
(TERPRI) (INDENT (PLUS !ILEVEL 2))
(PRINC (CAR ARGL)) (PRINC (QUOTE = )) (PRINC (EVAL (CAR ARGL)))
(SETQ ARGL (CDR ARGL))
(GO L1))
(SETQ !VAL
((LAMBDA (!ILEVEL)
(APPLY (GET (QUOTE ?FN) (QUOTE !OLDDEF)) *ARGS))
(ADD1 !ILEVEL)))
(TERPRI) (INDENT !ILEVEL)
(PRINC (QUOTE Returning from )) (PRINC (QUOTE ?FN))
(PRINC (QUOTE with )) (PRINC !VAL)
(TERPRI)
(RETURN !VAL))),
!PATTERN⊗⊗]⊗
.END
The programs ⊗indent and ⊗marks are used by a traced program to print
⊗n blanks or ⊗n marks say ">". We give the definition of ⊗indent. The
definition of ⊗marks is similiar.
.begin nofill select 2 turnon "∂"
.n←16
∂(n)indent n ←
∂(n+4)qprog[i]
∂(n+8)i←1
∂(n+4)loop
!fcnindent& ∂(n+8)qif i>n qthen qreturn qNIL
∂(n+8)princ " "
∂(n+8)i←i+1
∂(n+8)qgo loop
.end
Before evaluating a traced function it is necessary to initialize the
global variable $$!ILEVEL$.
For example if we say to LISP
.begin nofill select 6
.indent 10,10
(!TRACE READ)
(!TRACE READA)
(SETQ !ILEVEL 0)
(READ '(LP LP A RP A DOT B RP))
.end
the following will be printed
.begin nofill select 6
.indent 10,10
Entering READ
U = (LP LP A RP A DOT B RP)
>Entering READA
U = (LP A RP A DOT B RP)
L = NIL
>>Entering READA
U = (A RP A DOT B RP)
L = NIL
>>>Entering READA
U = (RP A DOT B RP)
L = (A)
Returning from READA with ((A) A DOT B RP)
Returning from READA with ((A) A DOT B RP)
>>Entering READA
U = (A DOT B RP)
L = ((A))
>>>Entering READA
U = (DOT B RP)
L = (A (A))
>>>>Entering READA
U = (B RP)
L = NIL
>>>>>Entering READA
U = (RP)
L = (B)
Returning from READA with ((B))
Returning from READA with ((B))
Returning from READA with (((A) A . B))
Returning from READA with (((A) A . B))
Returning from READA with (((A) A . B))
Returning from READ with ((A) A . B)
((A) A . B)
.end
The other case of debugging by modification of function definitions
is breaking. The idea is to modify a function by putting "break points"
in the code. The breaks may be conditional or unconditional, they
may be inserted at the beginning or before or after certain expressions in the
code. Thus we might have a program ⊗⊗mkbreaks[fn,clauses]⊗ which saves the
original definition of ⊗fn, sets a flag saying it is broken, makes the modifications
indicated by the list of break clauses, and makes the new definition.
Each break clause describes a position to place a break and the condition
under which the break is to occur (the test expression).
Given the ability to "break" the evaluation of an expression
we now need some useful routines for examining the state of the world.
The functions ⊗evalframe and ⊗freturn (or analogous functions) provide us
with the necessary primitives, but not it the most useful form. A typical
⊗helper program might include operations for moving up and down the
stack, printing the current expression, evaluting expressions in the
current environment, and forcing the broken computation to continue.
Then in a break loop ⊗helper could be invoked, and would go into
a command reading loop. Notice that moving down the stack (backwards in time)
can be done by alternately applying ⊗evalframe and selecting the pdl pointer
from the list. However moving back up is not so easy. The ⊗helper program
will need to maintain its own stack of pdl pointers.
.cb Exercises
.item←0
#. Add some additional features to the tracing program. For example
it would be useful to have only selected arguments to a function printed at
each call. Also printing the value of some additional expression before or after,
(or both) the evaluation the function application. You may think of
other features you would like to have in a tracing package.
#. Write a program that adds break points to function definitions.
You will need to decide how the break points are to be specified.
#. Write code to execute some ⊗helper commands. Use the ⊗evalframe
function described in section qsect{subsection eror}. Assume that qNIL
represents the pointer to the current top of the stack. Use and maintain
the global variables $POINT and $PTLIST which represent the current
position in the stack and the list of stack positions from here to the
top (to allow you to go up as well as down). Implement the commands
$UP <n> and $DOWN <n> for moving up and down the stack <n> levels,
$NEXT <fn> which moves to the next occurrence of a call to <fn>
down the stack, $CE which prints the expression being evaluated
at the current position, $VAL <exp> which evaluates <exp> in the
binding environment corresponding to the current position and prints the
result, and $FREES which finds all free variables in the current expression
and prints the variable names and values in the environment corresponding
to the current position. $FREES should check that the variable is actually
bound in the current environment and if indicate that fact rather that
causing an error by trying to evaluate it.
.ss(machinex,Exercises.)
.item←0
#. Write a program to "pretty print" function definitions.
The idea is to be able to print with a suitable division into
lines and with indentation so that the structure of the program is easier
to discover. The program may be similar to ⊗prinlis. You will need
additional special characters, say $SPACE and $NEWLINE. The
program should print subexpressions on a single line when ever possible,
align arguments in a function call, and pairs of a conditional when
they will not fit on a line. You will probably wish to experiment
to see how much indentation is desired in various cases. You will
probably be able to think of other conventions that will make the
resulting output easier to read.
#. A standard aid in the debugging machine language programs is the
ability to single step to the execution of a program. In this
mode the processor halts after the evaluation of each instruction,
allows the programmer to examine and modify the contents of
regiseters, memory locations, the program counter etc..
We can imagine the process of interpreting and S-expression and being
composed of indivisible steps such and applying primitive functions
building an environment to evaluate the body of a lambda expression,
a branching decision, etc.. Write a program ⊗stepper which
"single-steps" through the evaluation of an expression.
A simple version would prints out each expression as it begins to evaluate it,
and print out values as they are obtained. A fancier version might
go into a read-eval-print loop (maybe a break loop) so that the programmer
can examine the state of the computation, modify things, and then
continue the single steping. You may want options to bail out now, or
evaluate the current expression in fast mode (returning to single stepping
if there is any computation pending.